ABSTRACT

We outline a strategy to select faint (i_AB < 24.5) type 1 AGN candidates down to the Seyfert/QSO boundary for spectroscopic targeting in the COSMOS field (Scoville et al. 2007). Our selection process picks candidates by their nonstellar colors in uBVRizK broadband photometry from the Subaru and CFH Telescopes and by morphological properties extracted from HST ACS i-band data. Although the COSMOS field has been used extensively to survey the faint galaxy population out to z ~ 6, AGN optical color selection has not previously been applied to so faint a level over such a large contiguous area of the sky. Hot stars are known to be the dominant contaminant for bright AGN candidate selection at z < 2, but we anticipate that the highest color contamination rate at all redshifts will come from faint starburst and compact galaxies. Morphological selection via the Gini coefficient separates most potential AGN from these faint blue galaxies. Recent models of the quasar luminosity function (QLF) from Hopkins et al. (2007) are used to estimate quasar surface densities, and a recent study of stellar populations in the COSMOS field (Robin et al. 2007) is applied to infer stellar surface densities and contamination. We use 292 spectroscopically confirmed type 1 broad-line AGN and quasar templates to predict AGN colors as a function of redshift, and then contrast those predictions with the colors of known contaminating populations. Since the number of galaxy contaminants cannot be estimated as reliably as the stellar and predicted QLF numbers, the completeness and efficiency of the selection cannot be calculated before confirming spectroscopic observations are gathered. Instead we offer an upper-limit estimate of the selection efficiency (about 50% for low-z and 20-40% for intermediate-z and high-z candidates), as well as the completeness and efficiency with respect to an X-Ray point source population (from the COSMOS AGN Survey), in the range 20% to 50%. The motivation of this study and the subsequent spectroscopic follow-up is to populate and refine the faint end of the QLF, at both low and high redshifts, where the population of type 1 AGN is presently poorly known. The anticipated AGN observations will add to the ~300 AGN already known in the COSMOS field, making COSMOS a densely packed field of quasars that can be used to understand supermassive black holes and to probe the structure of the intergalactic medium in the intervening volume.

Subject headings: quasars: general; galaxies: luminosity function; galaxies: active; surveys; COSMOS



1. INTRODUCTION 

Optical colors provide a well-developed, reliable astronomical selection technique for stellar and galaxy populations. The method was first applied to AGN in the 1960s, based on the inference that quasars often have a larger ultraviolet excess than the hottest stars (Sandage & Wyndham 1965). Subsequent large-scale surveys have taken up the search for quasars (e.g. Schmidt & Green 1983; Foltz et al. 1987; Croom et al. 2001; Schneider et al. 2007), causing the known population to grow dramatically. The ongoing search for new quasars is strongly motivated by their use in probing the intergalactic medium (IGM) and in understanding the nature of supermassive black holes. Optical selection techniques have proven highly efficient at targeting and identifying new quasars, in some cases mitigating the need for confirming slit spectroscopy (Richards et al. 2002, 2004). Richards et al. (2002) used multi-color imaging from the Sloan Digital Sky Survey (SDSS) to select AGN and quasars down to i_AB < 21. In this paper we apply optical selection to the COSMOS field (Scoville et al. 2007), probing the AGN population to much fainter magnitudes (i_AB < 24.5) than any previous large-area survey, and we reveal challenges unique to the fainter AGN population and its contaminants. To properly account for contamination of the AGN candidate pool, we characterize the stellar populations that dominate at i_AB < 21 and the galaxy populations that are more prevalent at fainter magnitudes.

Targeting the AGN population to such a faint level is key to understanding the bulk properties of AGN and to constraining the faint end of the quasar luminosity function (QLF) at high redshift, which is poorly known: competing models (e.g. pure luminosity evolution vs. luminosity-dependent density evolution, most recently presented in Hopkins, Richards, & Hernquist 2007) differ by up to two orders of magnitude at i > 23. With a more complete QLF, astronomers can further analyze the nature of low-luminosity quasars, answering important questions about their host galaxies and environments. Such objects are also useful for interpreting the low-mass end of the black hole M-σ relation and for probing the IGM. A particular goal of observing faint AGN is to measure the growth rate of lower-mass black holes and/or AGN that accrete with lower efficiency. This faint survey brings quasar selection into a new regime of luminosity, placing new observational bounds on theoretical ideas about the nature and evolution of quasars.
Richards et al. (2004) used two 3D multi-color spaces to select QSO candidates in SDSS: ugri (u − g, g − r, r − i) for lower-redshift candidates and griz (g − r, r − i, i − z) for candidates with z > 2.5. AGN with 2.5 < z < 3.0 are well known to exhibit colors similar to A stars and are thus extremely difficult to isolate by optical means (Richards et al. 2002; Fan 1999; Richards et al. 2001), which justifies splitting the selection algorithm into high- and low-redshift components. Following the SDSS group, we use a u − B baseline for z < 2.5 and a combination of (B − V) and (V − i) colors to target AGN with z > 3.0. The intermediate redshift range (2.5 < z < 3.0) is also targeted and follows criteria similar to the high-redshift selection, but we expect a much lower object yield in this range due to heavy contamination from faint blue stars. Unlike the SDSS group, we do not anticipate recovering the AGN population with equal efficiency across all redshifts. Our goal is to push AGN selection to fainter magnitudes while maintaining reasonable efficiency (> 20%) and completeness (> 30%). Unique to our survey is the use of morphological information (via the ACS images) to separate marginally resolved AGN galaxies and unresolved stars (numerous only at brighter magnitudes) from more clearly resolved galaxies.
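A minimal sketch of the redshift split just described, assuming magnitudes are held in a plain dictionary. It only computes the color indices that feed each regime and applies no cuts, since the actual selection boundaries are defined later in the paper. The regime labels, band keys, and the name `colors_for_regime` are hypothetical.

```python
# Hypothetical sketch: which color indices feed each redshift regime.
# Band keys ("u", "B", "V", "i") and regime labels are illustrative only.
REGIME_COLORS = {
    "low-z": [("u", "B")],               # z < 2.5: u - B baseline
    "int-z": [("B", "V"), ("V", "i")],   # 2.5 < z < 3.0: similar to high-z
    "high-z": [("B", "V"), ("V", "i")],  # z > 3.0: (B - V) and (V - i)
}


def colors_for_regime(mags, regime):
    """Return a dict such as {"u-B": 0.3} of the colors used for `regime`.

    `mags` maps band name to apparent magnitude; no selection cut is applied.
    """
    if regime not in REGIME_COLORS:
        raise KeyError(f"unknown regime: {regime!r}")
    return {f"{a}-{b}": mags[a] - mags[b] for a, b in REGIME_COLORS[regime]}
```

For example, with u = 23.4, B = 23.1, V = 22.9, and i = 22.5, the "high-z" call returns B − V ≈ 0.2 and V − i ≈ 0.4.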

Since the goal of this study is to realistically constrain the faint AGN population, we hope to target a significant portion of our AGN candidates during future spectroscopic observations, anticipating a yield of 30-50 AGN per night from ~100 observed candidates, as well as gaining important information from the spectra of contaminating galaxies. Already ~160 candidates, chosen by the methodology of this paper, have been observed with IMACS and LDSS3 at Magellan as of May 2007, and more observations are planned. By building up significant statistics on low-luminosity AGN over such a large swath of the sky as COSMOS, we are in a unique position to improve what is known about the AGN population with meaningful statistics at the limit of current observations.

This paper discusses in detail the development of a reliable optical AGN selection algorithm, current knowledge of the QLF, and estimates of our algorithm's efficiency and completeness; observations and further development of the QLF will be discussed in a follow-up paper. The catalogs and data used in developing the selection algorithm are discussed in §2. The colors and nature of the contaminating populations are discussed in §3, while our method of morphological selection is given in §4. The specifics of the AGN selection are detailed in §5. In §6 we discuss the current picture of the QLF, predict number counts of contaminating populations, and discuss estimates of the efficiency of our methodology. We use a standard cosmology with Ω_m = 0.3, Ω_Λ = 0.7, and H_0 = 70 km s^-1 Mpc^-1 (e.g. Spergel et al. 2003), and luminosity distances computed according to Hogg (1999).
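As a sketch of the distance convention just stated, the function below evaluates the luminosity distance for the adopted flat cosmology (Ω_m = 0.3, Ω_Λ = 0.7, H_0 = 70 km s^-1 Mpc^-1) using the integral form in Hogg (1999). The numerical scheme (Simpson's rule), the neglect of radiation, and the function names are our assumptions, not the code used for this paper.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, omega_l=0.7, steps=1000):
    """Luminosity distance in Mpc for a flat Lambda-CDM cosmology.

    Follows Hogg (1999): D_L = (1 + z) * D_C, with
    D_C = (c / H0) * integral_0^z dz' / E(z') and
    E(z) = sqrt(omega_m * (1 + z)^3 + omega_l). Radiation is neglected and
    the integral is evaluated with Simpson's rule.
    """
    if z < 0:
        raise ValueError("redshift must be non-negative")
    if abs(omega_m + omega_l - 1.0) > 1e-6:
        raise ValueError("this sketch assumes a flat cosmology")
    if z == 0:
        return 0.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    comoving_mpc = (C_KM_S / h0) * total * h / 3.0  # line-of-sight comoving distance
    return (1.0 + z) * comoving_mpc


def distance_modulus(z, **cosmology):
    """m - M = 5 log10(D_L / 10 pc), without any K-correction."""
    d_pc = luminosity_distance_mpc(z, **cosmology) * 1.0e6
    return 5.0 * math.log10(d_pc / 10.0)
```

At z = 1 this gives D_L ≈ 6.6 Gpc and a distance modulus of about 44.1 mag; converting an apparent magnitude to an absolute one additionally requires a K-correction, which this sketch does not include.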

2. CATALOGS AND TRAINING DATA 

AGN candidates were selected from the overlap of two catalogs: the COSMOS photometric catalog (hereafter CPC) from Capak et al. (2007) and the COSMOS HST Morphology Catalog (CMC) from Abraham et al. (2004, 2007). The former contains photometric information in uBVrizK broadband filters and photometric redshifts for 3,234,836 objects in the extended 3.5 square degree Subaru optical field, and 2,326,609 objects in the central 1.7 square degrees covered by Hubble ACS imaging. The central 1.7 square degrees is fully imaged with the F814W ACS filter, and the resulting catalog is 95% complete down to i_ACS = 26.0. The Morphological Catalog (CMC) includes detailed 2D morphology for 195,706 objects restricted to 18.0 < i_ACS < 24.5, which is ~80% of all CPC objects within the same magnitude limits. For a plot of the differential number counts, see Figure 1. For reasons elaborated on later (see §3.3), mainly heavy galaxy contamination and a large increase in photometric errors, we do not present a purely color-dependent algorithm for selecting AGN candidates that are in the CPC but excluded from the CMC.

FIG. 1. Differential distribution of i_AB apparent magnitude (in bins of 0.05 mag) for all objects in the COSMOS photometric (dashed) and morphological (solid) catalogs. The larger COSMOS Photometric Catalog (CPC) contains over 2 million objects in the central ~2 square degrees (~90% of which have i_AB > 24.5) and becomes incomplete fainter than i_AB = 25.9 (dotted line). The morphological catalog (the subset of the CPC defining our candidate pool) contains about 195,000 objects, constrained approximately by 18 < i_AB < 24.5 and exactly by 18 < i_ACS < 24.5. As discussed in the text, i_auto is used throughout the paper to denote i_AB because it most accurately represents the full integrated flux in the standard AB-system i band. Within the magnitude range of interest (18 < i_AB < 24.5), about 80% of all CPC objects are also contained in the COSMOS Morphological Catalog (CMC); the discrepancy lies largely between magnitudes 23.5 and 24.5.

In addition to the selection catalogs, we consider a "training set" of known AGN in the COSMOS field with confirming spectroscopy (Trump et al. 2007). To model the complex nature of AGN selection, we use the AGN training set along with four type 1 AGN color templates adapted from SEDs presented by Budavari et al. (2001). A recent analysis of the COSMOS stellar population (Robin et al. 2007) is used to estimate contamination levels from stars after establishing the algorithm. A list of 1073 X-Ray point sources (hereafter XRPS) without spectroscopy is used to test the algorithm's efficiency once the method design has been explained.

2.1. COSMOS Photometric Catalog 

Data are drawn from the 3 Jan 2006 data release of the Cosmic Evolution Survey (COSMOS) ~2 square degree equatorial field, imaged with large ground-based telescopes (Subaru, VLA, ESO-VLT, UKIRT, NOAO, CFHT) and space-based observatories (Hubble, Spitzer, GALEX, XMM, Chandra). The latest release of the photometry catalog (CPC) includes detections of over 3 million objects in the Subaru i+ band filter over an extended 3.5 square degree field (the offset to SDSS i is +0.3 magnitudes), and magnitudes in CFHT u*, i* (hereinafter denoted u, i_c), Subaru B_J, V_J, r+, i+, z+ (denoted BVriz), Kitt Peak/CTIO Ks, narrow-band Subaru NB816, and the F814W HST ACS band (i_ACS). The CPC's main use by the COSMOS collaboration has been to survey the galaxy population, of which over 2 million galaxies have been detected out to z ~ 3 (Scoville et al. 2007; Capak et al. 2007). Imaging in F814W with Hubble ACS provides sufficiently deep data for reliable morphological classification down to i_ACS = 24.5, described more fully in §2.2.

TABLE 1
Limiting and Completeness Magnitudes for COSMOS Photometry

Band    Limiting     Completeness
u_C     26.3±0.4     27.1±0.1
B_S     25.6±0.4     26.7±0.2
V_S     25.6±0.5     26.6±0.2
r_S     25.7±0.4     26.2±0.1
i_S     24.6±0.5     25.9±0.2
i_C     23.1±0.4     25.2±0.1
z_S     23.6±0.6     25.4±0.2
K_K     20.1±0.3     22.9±0.1

NOTE. The limiting magnitude is defined as the faintest magnitude at which the fractional photometric error is below 10%. The completeness magnitude is the magnitude at which the given band is 95% complete. The 'C' subscript corresponds to CFHT photometry, 'S' to Subaru, and 'K' to Kitt Peak/CTIO.

Photometric redshifts are estimated via two methods: the COSMOS team code (Mobasher et al. 2006) and the Bayesian Photometric Redshift (BPZ) code (Benítez 1999). The dispersion in photometric redshifts is comparable and small in either case (σ(z)/z ~ 0.04), but the Mobasher code measures reddening and does a better job of breaking redshift degeneracies. Although the photometric redshifts are effective for Hubble-typing galaxies, they are clearly inappropriate for our AGN candidates, which have complex, multi-component spectra not easily characterized by SED fitting based on stellar populations; neither photometric redshift code uses AGN templates. We will use the photometric redshifts to quantify galaxy color properties and to understand the contamination rates as a function of redshift, for which both methods (Mobasher and BPZ) are reliable and produce similar results.

The photometric catalog quotes a detection-band magnitude, i_auto, which defaults to the Subaru i+ magnitude except where the source is saturated or missing in the Subaru image, in which case CFHT i* is used instead. The subscript auto refers to the SExtractor AUTO aperture used to calculate magnitudes inside an adjustable, elliptical isophote (Bertin & Arnouts 1996). CFHT magnitudes dominate at i_auto < 20.1 (these are saturated sources in Subaru photometry) and constitute a smaller population of objects at fainter magnitudes, out to i_auto = 24. For objects with photometry in both bands, the CFHT and Subaru magnitudes are consistent out to i = 26.0, fainter than the AGN candidates, which are limited by the depth of the morphological catalog (i_auto ~ 24.5). We therefore do not distinguish between them and operate in terms of the apparent magnitude i_AB, which throughout this paper refers to i_auto. All other magnitudes in the catalog are calculated using a SExtractor fixed aperture of 3" diameter and are only used when discussing colors.
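The fallback rule that defines i_auto can be written as a small function. This is a sketch under stated assumptions: the argument names and the boolean `subaru_saturated` flag are hypothetical stand-ins, not CPC column names.

```python
import math
from typing import Optional


def detection_magnitude(subaru_i: Optional[float],
                        cfht_i: Optional[float],
                        subaru_saturated: bool = False) -> Optional[float]:
    """Sketch of the i_auto convention described in the text.

    Prefer the Subaru i+ AUTO magnitude; fall back to CFHT i* when the Subaru
    measurement is missing (None or NaN) or flagged as saturated. The
    `subaru_saturated` flag is a hypothetical input, not a CPC column name.
    """
    subaru_ok = (subaru_i is not None and math.isfinite(subaru_i)
                 and not subaru_saturated)
    return subaru_i if subaru_ok else cfht_i
```

Because the two magnitude systems agree to i = 26.0, well past the candidate limit, a single merged column is adequate for selection purposes.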

Since optical AGN have an intrinsic spread in their spectral energy distributions (SEDs), the difficulty in selecting candidates is aggravated by photometric errors. For each band that we use during object selection, we quote two characteristic limiting magnitudes: the first is the magnitude at which the error is 10%, and the second is the 95% limit of catalog completeness. Table 1 shows these magnitudes for each filter. Although the deepest band in the catalog is B, and the other bands become progressively shallower at redder wavelengths, i+ is chosen as the detection band because it does not bias against higher-redshift objects (except at z > 5), is not substantially affected by reddening, is the deepest red band, and is typically used as the detection band for large optical surveys. A χ² band (a co-addition of the i, r, and B band images; see Capak et al. 2007) has increased sensitivity and a pan-chromatic advantage and was also considered as a detection band; however, the i+ band gives the much better resolution needed for high-quality photometric calculations.
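To make the two characteristic magnitudes concrete, the sketch below encodes the Table 1 values and reports the bands in which a source is brighter than the 10%-error limit. It also shows the first-order conversion of a 10% fractional flux error into about 0.11 mag. The band labels are our reading of the Table 1 rows, and the function names are hypothetical.

```python
import math

# (limiting, completeness) magnitudes from Table 1; uncertainties omitted.
TABLE1_LIMITS = {
    "u_C": (26.3, 27.1),
    "B_S": (25.6, 26.7),
    "V_S": (25.6, 26.6),
    "r_S": (25.7, 26.2),
    "i_S": (24.6, 25.9),
    "i_C": (23.1, 25.2),
    "z_S": (23.6, 25.4),
    "K_K": (20.1, 22.9),
}


def mag_error_from_fractional(frac_flux_error):
    """First-order magnitude error for a fractional flux error: 2.5/ln(10) * df/f."""
    return 2.5 / math.log(10) * frac_flux_error


def reliable_bands(mags):
    """Bands in which a source is brighter than the 10%-error limiting magnitude.

    `mags` maps band label to apparent magnitude; unknown labels are ignored.
    """
    return sorted(band for band, m in mags.items()
                  if band in TABLE1_LIMITS and m < TABLE1_LIMITS[band][0])
```

For a source with i_S = 24.0, B_S = 24.5, and K_K = 21.0, only B_S and i_S are returned: at this depth the K-band photometry is already noisier than 10%.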

2.2. COSMOS Morphological Catalog 

The COSMOS HST morphological catalog (CMC), generated by Bob Abraham at the University of Toronto (Abraham et al. 2004, 2007), uses single-filter ACS imaging to extract 2D morphological classifications down to i_ACS < 24.5. The CMC is primarily designed for studying the morphological properties of galaxies in the COSMOS field using 2D parametric and non-parametric measures. In the special version of the CMC used in this project, morphologies were calculated down to a level too faint for reliable galaxy work; this was possible because AGN are generally described by a point source surrounded by a fainter host galaxy. The catalog is not taken to fainter levels because the robustness of even basic morphological calculations deteriorates. The CMC includes ACS magnitudes (total AUTO magnitude), orientation, ellipticity, mean surface brightness, central surface brightness, half-light radius, signal-to-noise ratio, concentration index, and the Gini coefficient (among other parameters not used in this study). Since a significant color correction is applied between F814W and i_AB, there is not a clean cutoff at i = 24.5 for CMC sources in Figure 1. Spurious sources in both the ground-based and ACS data, especially in the wings of bright sources, cause the tail of a few objects out to i = 26; these would normally fall within the 18 < i < 24.5 range.

The Gini coefficient, hereinafter denoted G, is a non-parametric measure of concentration (0 ≤ G ≤ 1, where G = 1 corresponds to a point source with all the flux in one pixel and G = 0 to a uniformly extended source with no discernible center) that does not assume a central pixel or a PSF. The advantage of its use is that it can morphologically characterize galaxies of arbitrary shape and does not require a well-defined nucleus; it is a more general treatment than PSF-based classification of stars and galaxies and covers a wider range of irregularly or asymmetrically shaped objects. For a more detailed treatment and definition of Gini, as well as a description of the term's origin in economics (Gini 1912), see Abraham et al. (2003). In its original context (Abraham et al. 2004), Gini is calculated from pixels lying within a set of quasi-Petrosian radii (unique to each object), giving the best 2D morphological analysis for galaxy evolution studies. At faint magnitudes (i_AB > 24.5), this approach to calculating Gini breaks down, and compact objects have much lower G than their bright counterparts because background noise is included within a more extended Petrosian radius. Since this study requires a clean separation of resolved and unresolved sources, we have altered the Gini computation so that the Petrosian radius is not adjusted from object to object; this makes the decrease in G with fainter magnitude less severe. The behavior of G with magnitude may be seen in Figure 2. The strip along the top corresponds to unresolved sources (stars, compact AGN), while the large population with low G corresponds to galaxies. Clearly, Gini is most useful for distinguishing well-resolved galaxies from unresolved or partially resolved AGN galaxies.

Casey et al.

FIG. 2. The behavior of the Gini coefficient (G) applied to the COSMOS/ACS i-band mosaic as a function of magnitude. The overall decrease in Gini with magnitude is largely due to observational bias, i.e. fainter sources are found to be more extended since there is less contrast with the image background. Altering the original Gini algorithm described by Abraham et al. (2003) (so as not to use adjustable quasi-Petrosian radii) gives cleaner separation between unresolved sources (across the top) and resolved, extended sources (the bulk of the objects, with low G). Gini is therefore useful to reject extended galaxies and retain unresolved or marginally resolved stars and AGN. The solid line indicates the G selection criterion adopted later in the paper for low-redshift AGN (with candidates chosen to lie above the line), while the dashed line indicates the more conservative boundary used for intermediate- and high-redshift AGN selection (see §5.1 and §5.2).

FIG. 3. The type 1 AGN training data's magnitude as a function of spectroscopic redshift. All data points are in the set of 268 type 1 AGN with color and morphology information (as described in Table 2), and the different symbols represent the three parent data sets: the COSMOS AGN survey (Trump et al. 2007), SDSS overlap with the COSMOS field (Richards et al. 2005), and additional spectroscopic follow-up of SDSS sources on MMT (Prescott et al. 2006).
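A minimal sketch of the Gini statistic discussed above, assuming pixel fluxes are available as plain Python lists. It uses the discrete form given by Lotz et al. (2004), which is one common way to compute the Lorenz-curve statistic of Abraham et al. (2003); the CMC implementation may differ in detail. The fixed circular aperture in the hypothetical `gini_fixed_aperture` only mirrors, in spirit, the choice not to adjust the Petrosian radius per object.

```python
def gini(values):
    """Gini coefficient of pixel fluxes.

    Discrete form: G = sum_i (2i - n - 1) |X_i| / (mean(|X|) * n * (n - 1)),
    with |X_i| sorted in increasing order (i = 1..n). G = 1 when all the flux
    is in one pixel and G = 0 when the flux is uniform.
    """
    x = sorted(abs(v) for v in values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pixels")
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("total flux is zero")
    weighted = sum((2 * rank - n - 1) * xi for rank, xi in enumerate(x, start=1))
    return weighted / (mean * n * (n - 1))


def gini_fixed_aperture(image, xc, yc, radius):
    """Gini of the pixels inside a circular aperture of fixed `radius`.

    `image` is a list of rows (image[y][x]). Using the same radius for every
    object is a hypothetical simplification of the modified CMC calculation.
    """
    pixels = [
        image[y][x]
        for y in range(len(image))
        for x in range(len(image[y]))
        if (x - xc) ** 2 + (y - yc) ** 2 <= radius ** 2
    ]
    return gini(pixels)
```

A single bright pixel gives G = 1 and a flat patch gives G = 0, bracketing the unresolved strip and the extended galaxy population seen in Figure 2.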

2.3. AGN Training Data 

Table 2 gives details on the AGN training data, the different data sets they originate from, and the number counts of type 1 AGN, AGN included in the CMC, and type 1 AGN included in the CMC. Since the training data sources overlap, the reader should refer to Table 2 for a breakdown of training data sources throughout this subsection. Figure 3 shows magnitude versus redshift for 268 training set AGN (those training AGN for which we have both color and morphology information; see Table 2). With 1450 total spectroscopically observed AGN targets in the COSMOS field, 292 of which are type 1 AGN, we are able to infer colors and morphology as a function of redshift to characterize and calibrate the AGN candidate population. The AGN training data come from three sources: an X-Ray or radio selection from the COSMOS spectroscopic AGN survey (Trump et al. 2007), the SDSS optical selection with confirming spectroscopy overlapping the COSMOS field (Richards et al. 2002), and spectroscopically confirmed SDSS optical targets from observations on MMT/Hectospec (Prescott et al. 2006). The observational details of each training data set are given in the following paragraphs. Since the training data sets overlap, each set is described by the number of unique objects that were not included in previously described training data sets (starting with the data from Trump et al. 2007; see Table 2, row 2). We do not use type 2 narrow-line AGN in this paper since their optical colors have larger variations due to lower emission-line flux and obscuration. We anticipate that the candidate objects will mostly be type 1 AGN, since at these magnitudes (i ~ 24) strong, broad emission features and a non-thermal continuum are needed for identification.
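The bookkeeping rule for overlapping training sets (each set credited only with objects not already in an earlier set) can be sketched with Python sets. The function name and any object IDs used with it are hypothetical; ordering the input as the sets are described in the text (Trump et al. 2007 first) is what produces "unique object" counts of the kind listed in Table 2.

```python
def sequential_unique_counts(datasets):
    """Count the objects each data set adds beyond the sets listed before it.

    `datasets` is an ordered list of (name, iterable_of_object_ids); order
    matters, since an object is credited to the first set that contains it.
    Duplicate IDs within a set are counted once.
    """
    seen = set()
    counts = {}
    for name, ids in datasets:
        new_ids = set(ids) - seen
        counts[name] = len(new_ids)
        seen |= new_ids
    return counts
```

Summing the returned counts gives the total number of distinct objects, independent of the order in which the sets are processed.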

The X-Ray/Radio selected sources (limited to i < 23) come from the first spectroscopic observations of the COSMOS AGN Survey (Trump et al. 2007) using the Inamori Magellan Areal Camera & Spectrograph (IMACS; Bigelow et al. 1998) on the Magellan (Baade) Telescope. The first year of observations yielded 284 AGN with spectroscopic redshifts, 115 of which were originally radio sources and 169 X-Ray sources (Schinnerer et al. 2007; Brusa et al. 2007). In a second round of observations, 1050 more AGN were spectroscopically confirmed. Type 1 AGN are likely 90% complete to i_AB < 23. The survey had a 72% targeting yield (the percentage of candidates that are actually AGN) down to i_AB = 24, and a much better yield, > 90%, for i_AB < 22. A small subset of the observed targets was difficult to classify, but the majority were a variety of type 1 and type 2 AGN. Altogether, 1334 AGN were spectroscopically observed, 200 of which are type 1 AGN included in the CMC (see Table 2). The intrinsic selection bias between X-Ray/Radio selected objects and optically selected objects is valuable to investigate: although we work at much fainter magnitudes than the faintest SDSS optically selected QSOs (i = 21), we expect to inherit the same optical selection biases. Including the X-Ray/Radio objects gives a relatively unbiased, independent sample of the optical properties of the true AGN population, extends the training set to fainter magnitudes (i = 23) than its optically selected counterparts, and may include more extended, well-resolved AGN galaxies that would be rejected by optical selection.

The SDSS sample (limited to i < 21) comes from the overlap region of SDSS with the COSMOS field (from SDSS DR1), originally targeted and selected either optically or as X-Ray sources (see Richards et al. 2002 for selection details). Of the 86 spectroscopically targeted objects (75 unique objects), 51 are type 1 broad-line AGN. These sources are primarily well resolved and bright (i < 21). An additional 119 objects were optically selected as QSOs (Richards et al. 2002, 2004) with high confidence (90%), but only 3 of these objects do not overlap with the other spectroscopic data (including the observations from Prescott et al. 2006, described in the following paragraph), so they were not included in the analysis.
TABLE 2
AGN Training Data

(1)                  (2)     (3)    (4)        (5)     (6)
Type                 Trump   SDSS   Prescott   Total   Use
All Objects          1334    86     94         N/A     N/A
Unique Objects       1334    75     38         1450    Color on all types of AGN (a)
Type 1 AGN           200     51     38         292     Color on Type 1 AGN
AGN in CMC           1334    41     31         1406    Color and Morph on all types of AGN (a)
Type 1 AGN in CMC    200     37     31         268     Color and Morph on Type 1 AGN

NOTE. The AGN training data are broken down by source catalog (Trump et al. 2007; Richards et al. 2002; Prescott et al. 2006) and by type (all AGN, type 1 AGN, AGN in CMC, and type 1 AGN in CMC). Column (1) describes the type of AGN, column (2) represents objects from the COSMOS AGN Survey (Trump et al. 2007), column (3) represents objects observed by SDSS (Richards et al. 2002, 2005) with confirming spectroscopy, and column (4) represents objects observed by Prescott et al. (2006). The total number of AGN of each type is given in column (5) and their use in our analysis in column (6); e.g. the most useful set has both color and morphological information for type 1 AGN and contains 268 objects (the redshift-magnitude distribution of these objects is shown in Figure 3).
(a) These sets were not analyzed or used in this paper since they include type 2 AGN.



There are 94 spectroscopically confirmed quasars (38 unique objects) in our training set observed with the MMT 6.5 m telescope and the Hectospec multi-object spectrograph (Prescott et al. 2006). The original 336 targets were marked with quasar flags drawn from the SDSS DR1 catalog, as assigned by the SDSS multicolor quasar selection algorithm discussed above. Eighty of the 94 quasars did not appear in previous follow-up confirmation studies. The quasars span magnitudes 18.3 < g < 22.5 and redshifts 0.2 < z < 2.3, and the results of this study support the lower limit on the quasar surface density from SDSS color selection of 102 AGN per square degree down to g = 22.5 over the entire COSMOS field.

2.4. Narrow Emission Line Galaxies 

Additional observations on MMT/Hectospec from Prescott et al. (2006) give 168 narrow emission line galaxies (NELGs) in the COSMOS field: objects that were originally tagged as probable AGN from SDSS color selection but were found to be NELGs in spectroscopic follow-up. Since these objects share the colors of AGN, this set acts as a control sample of blue galaxies, used to understand galaxy morphology and the necessary components of the morphological selection design (see §4). They span redshifts 0.2 < z < 2.3 and magnitudes 18 < i < 22.5. These NELGs are used exclusively to understand contaminants and probe selection efficiency.

2.5. X-Ray Sources 

From an original set of 1865 X-Ray point sources (XRPS) in the COSMOS field (Brusa et al. 2006, 2007; Hasinger et al. 2006, 2007; Cappelluti et al. 2007), the set is narrowed down to 1073 objects whose optical identifications are secure at 98% confidence and which are contained in the CMC (Brusa et al. 2007). They are used in this paper as a test set and are treated separately from the spectroscopically confirmed training set of §2.3. The XRPS were not spectroscopically targeted by Trump et al. (2007) because they were too faint for IMACS targeting, were not allocated a slit during observations, or lay outside the regions of the 2-degree field targeted with IMACS to date. Despite its large size, this sample is less useful in formulating the algorithm design. In terms of both colors and Gini, the optical counterparts to optically faint XRPS show a wide range of properties, and many are low-luminosity, low-redshift Seyfert galaxies. Select spectroscopy reveals that 50% are type 1, 33% are narrow-line type 2, and 17% are ellipticals. We return to this sample at the analysis stage to assess the selection efficiency and completeness, and we use it roughly in the discussion of the high-redshift selection procedure (see §5.2).
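The efficiency (what the COSMOS AGN Survey description above calls targeting yield) and the completeness used throughout can be stated as simple set ratios. The sketch assumes a reference population of object IDs, such as the XRPS test set, is treated as the "true" sample; that treatment and the function name are assumptions for illustration.

```python
def efficiency_and_completeness(selected_ids, reference_ids):
    """Efficiency and completeness of a selection against a reference set.

    efficiency   = |selected & reference| / |selected|
    completeness = |selected & reference| / |reference|
    Treating `reference_ids` (e.g. the XRPS test set) as the "true"
    population is an assumption of this sketch. Empty inputs give 0.0.
    """
    selected, reference = set(selected_ids), set(reference_ids)
    hits = len(selected & reference)
    efficiency = hits / len(selected) if selected else 0.0
    completeness = hits / len(reference) if reference else 0.0
    return efficiency, completeness
```

Against an X-Ray reference set, both numbers are only lower bounds on performance for optically selected type 1 AGN, since optically faint XRPS include type 2 AGN and ellipticals that a color selection is not designed to recover.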



2.6. AGN Templates 

The training set described in §2.3 gives mean AGN colors out to z_spec ~ 3. Since we intend to target AGN out to z ~ 6, templates developed by Budavari et al. (2001) are used to infer colors at higher redshift. Budavari et al. developed four type 1 AGN templates. Rather than characterizing physical differences between AGN, these portray four optimal/empirical fits to observed type 1 SEDs. We compare the template color predictions with the training data and use the best-fit template to predict AGN colors at higher redshift, where the training data run out (as shown in a later figure). Although there is significant variation in color between templates (up to Δm ~ 0.5), there is also intrinsic spread in AGN color about the mean (as seen in the training set objects, σ ~ 0.3 mag), so the particular choice of template is not critical. This observed color variance is a function of magnitude and thus also of redshift, but we assume for simplicity that the spread about the best-fit template is σ = Δm = 0.3. The use of the AGN color templates will later be depicted graphically in Figures 6 (AGN color with redshift), 7 (galaxy colors with redshift), 12 (template track in V − i vs. Gini), and 13 (the 2D color selection for z > 2.5 candidates with template tracks overplotted).
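The template comparison described above can be sketched as a least-squares choice among color tracks, followed by a consistency test against the assumed σ = 0.3 mag spread. Representing each template as a callable from redshift to color (real templates would be SED-derived tables interpolated in redshift), the least-squares criterion, and the function names are all assumptions.

```python
def best_fit_template(training, templates):
    """Name of the template whose color track best matches the training data.

    training:  list of (redshift, observed_color) pairs
    templates: dict mapping template name -> callable(redshift) -> color
    The criterion (least summed squared residuals) is an assumption.
    """
    def sse(track):
        return sum((color - track(z)) ** 2 for z, color in training)

    return min(templates, key=lambda name: sse(templates[name]))


def consistent_with_track(color, z, track, sigma=0.3, nsigma=1.0):
    """True if `color` lies within nsigma * sigma of the track at redshift z.

    The default sigma = 0.3 mag follows the assumed spread about the
    best-fit template.
    """
    return abs(color - track(z)) <= nsigma * sigma
```

Because the assumed spread (0.3 mag) is comparable to the template-to-template differences (up to about 0.5 mag), the choice made by `best_fit_template` mainly matters where the training data run out.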

2.7. Stellar Surface Density 

A recent study of the stellar populations in the COSMOS
field by Robin et al. (2007) uses HST morphology coupled
with detailed stellar SED fits to identify stars with 90%
completeness at i = 27.0. Their estimate of the COSMOS
stellar population (later described as the "strict" SED fit) is
useful when assessing the selection algorithms' completeness
and efficiency relative to contaminating stars. Their
methodology identifies point sources via magnitude and
central surface brightness (the MU_MAX SExtractor
parameter). We support the "strict" SED method outlined in
their paper (Robin et al. 2007) since the number counts it
produces agree with our cruder estimate of stellar counts
identified solely through an identical morphological
point-source identification. Robin's "loose" restriction on the
quality of the SED fits results in much greater stellar
densities (by a factor of ten at the faint end) and largely
disagrees with other star counts from the literature. They
include the "loose" SED fit data in their study to demonstrate
the difficulty of star/galaxy separation and the range of
possible separation methods. Our point-source identification
is done in three ways: (1) magnitude and central surface
brightness (denoted CSB), (2) magnitude and half-light radius
(denoted RHO), and (3) the intersection of those two methods.
As seen in Figure 4, all of these methods produce roughly the
same stellar surface densities as the "strict" stellar SED fitting
method. While many of the AGN we are trying to target might
be mislabeled as stars in this set, the number of stars
substantially outweighs the number of possibly selected AGN.
Since we do not use this stellar set to precisely predict colors
apart from AGN (save the rough preliminary estimates shown
in Figure 5), and instead use it to predict stellar number
counts, the inclusion of AGN is negligible. In §6 we pass this
population (identified in both the CMC and CPC) through our
AGN selection filters to determine the level of stellar
contamination as a function of magnitude, which leads to
estimates of efficiency and completeness.

FIG. 4. Stellar surface densities from point-source selection
using central surface brightness (CSB), half-light radius (RHO),
and the intersection of those two techniques, compared with the
stellar surface densities of Robin et al. (2007). All three
methods agree roughly with Robin's "strict" SED method shown
here (the CSB selection working best), while Robin's "loose"
SED fit method is an order of magnitude higher at the faint end,
which we assert cannot be representative of the true stellar
surface density. While the Robin et al. (2007) method
presumably isolates stars, such objects are not excluded from
the set of candidate AGN since their prior identification as
stars cannot be certain.
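The three point-source selections (CSB, RHO, and their intersection) and the conversion to surface density can be sketched as below. This is only an illustration under stated assumptions: the linear stellar locus `mu_max = mag + offset`, the PSF half-light radius, the tolerances, and the 2 deg² area are placeholders rather than values from this paper, and all function names are hypothetical.

```python
import numpy as np


def csb_point_sources(mag, mu_max, offset, tol=0.3):
    """CSB method: for a fixed PSF, point sources follow mu_max = mag + offset.

    mu_max is the peak surface brightness (mag/arcsec^2); extended objects have
    a fainter (numerically larger) mu_max than a point source of the same mag.
    """
    return np.abs(np.asarray(mu_max) - (np.asarray(mag) + offset)) < tol


def rho_point_sources(r_half, r_psf, tol=0.1):
    """RHO method: point sources have half-light radii close to the PSF value."""
    return np.abs(np.asarray(r_half) - r_psf) < tol


def surface_density(mag, mask, bins, area_deg2=2.0):
    """Number of selected objects per square degree in each magnitude bin."""
    counts, _ = np.histogram(np.asarray(mag)[mask], bins=bins)
    return counts / area_deg2


mag = np.array([19.0, 20.0, 21.0, 22.0])
mu_max = mag + np.array([-2.0, -2.0, -1.0, -2.1])  # third object is extended
r_half = np.array([0.1, 0.1, 0.5, 0.3])
csb = csb_point_sources(mag, mu_max, offset=-2.0)
rho = rho_point_sources(r_half, r_psf=0.1)
both = csb & rho  # the "intersection" method
print(surface_density(mag, both, bins=[18, 20, 22, 24]))
```

The intersection is the most conservative of the three masks, which is why it can only lower the stellar counts relative to either method alone.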

3. COLOR SELECTION 

The use of colors here differs from previous practice
(e.g., Richards et al. 2002, 2004) in that efficient selection
is possible without incorporating every available band into
the criteria. Since quasars exhibit power-law continua
with a strong UV excess, the choice of u - B to select lower
redshift objects is well motivated and historically successful
(Sandage & Wyndham 1965; Koo & Kron 1982; Warren et al.
1991; Hewett et al. 1995; Hall et al. 1996; Croom et al. 2001;
Richards et al. 2002). Incorporating additional information
from redder baselines for low redshift, like B - V and V - r,
does not improve selection efficiency on the training set, as
discussed in §5.1. In contrast, intermediate and high redshift
selection requires a more sophisticated approach since no
single color is effective in distinguishing stars and AGN. The
optimum two-color choice for both intermediate and high
redshift selection, B - V and V - i, balances the depth of the
bluer bands against the need to look toward the red bands for
high-z candidates. By selecting subsets of the catalog that
represent stellar and galaxy populations and investigating their
colors as a function of apparent magnitude, we conclude that
the color selection method will be uniform across the entire
magnitude range (18.0 < i_AB < 24.5).

FIG. 5. Colors of the training set of 292 AGN (solid gray) and
1791 bright stars in the COSMOS field as identified by Robin et
al. (2007) (lined). The distributions are normalized to equal
numbers to more easily show relative colors. To select z < 2.5
objects with the lowest stellar contamination rate, a short blue
baseline is used (i.e., u - B, u - V), not only because of the
clear separation but also because these are the deepest bands,
optimal for lower luminosity AGN. As discussed in the
selection portion of the paper (§5), the division between AGN
candidates and stars is taken at u - B = 0.67.

3.1. AGN Colors with Redshift

The colors of AGN as a function of redshift are illustrated
in Figure 6 for our three primary colors, u - B, B - V and
V - i. The spread of AGN color (from the 292 type 1 AGN) is
consistently large in each color (Δm ~ 0.3). Our low redshift
selection declines rapidly in efficiency at z ~ 2.4, where the
mean u - B becomes significantly redder and crosses the
stellar locus as Lyα emission enters the B band. This is the
natural boundary of the low redshift selection. A similar
reddening happens at slightly higher redshift in B - V;
however, this color adds selection power because AGN are
redder than the contaminating stars for z > 4. An even redder
baseline, V - i, shows a much flatter dependence on redshift
and is used together with B - V to optimize selection of high
redshift, faint candidate AGN, as described in §5.2.
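The redshifts at which Lyα and the Lyman limit move through a band follow from λ_obs = (1 + z) λ_rest. The sketch below assumes approximate u and B filter edges purely for illustration (they are not the survey's measured transmission curves). With these assumed edges, Lyα enters the B band near z ≈ 2.2 and leaves it near z ≈ 3.0, bracketing the z ~ 2.4 boundary described above, and the Lyman limit clears the red edge of B near z ≈ 4.4.

```python
LYA_REST = 1215.67   # Lyman-alpha rest wavelength in Angstrom
LYLIM_REST = 911.8   # Lyman limit rest wavelength in Angstrom

# Approximate filter edges in Angstrom; illustrative assumptions only.
FILTER_EDGES = {"u": (3400.0, 4000.0), "B": (3900.0, 4900.0)}


def redshift_at(lambda_obs, lambda_rest):
    """Redshift at which a rest-frame feature is observed at lambda_obs."""
    return lambda_obs / lambda_rest - 1.0


def crossing_range(lambda_rest, band):
    """Redshifts at which a feature enters (blue edge) and leaves (red edge) a band."""
    blue, red = FILTER_EDGES[band]
    return redshift_at(blue, lambda_rest), redshift_at(red, lambda_rest)


for name, lam in [("Lyman-alpha", LYA_REST), ("Lyman limit", LYLIM_REST)]:
    for band in ("u", "B"):
        z_in, z_out = crossing_range(lam, band)
        print(f"{name} in {band}: {z_in:.2f} < z < {z_out:.2f}")
```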



The training data agree broadly with the four type 1
templates from Budavári et al. (2001) up to the limit of the
data around z = 3, with the exception of the lowest redshifts
in u - B, where the AGN are redder than all templates.
Figure 6 shows as a solid curve the template that best fits the
data, which is used for the predictions of AGN colors at z > 3.
Colors at high redshift are inevitably uncertain. At z ~ 4.5,
the Lyman limit passes through the B band, so that both the u
and B bands lie blueward of it; this leaves nearly zero flux in
both filters and causes a sharp drop in the predicted u - B.
The usefulness of each color will become more apparent as
we consider the contaminating populations.

FIG. 6. AGN colors as a function of redshift. The 292 type 1
training AGN (selected as defined in the text) are small dots,
the diamonds represent the mean of the training colors (with
1σ spread) in Δz = 0.2 redshift bins, and the four lines
(3 dashed, 1 solid) are the colors predicted from templates.
The solid line is the preferred template, as judged by deviation
from the mean training colors. Although the training data only
extend to z = 3, template colors can be used to anticipate high
redshift AGN colors. These colors rise for z > 3 as Lyman-α
emission and then the Lyman limit pass from the u band to B
and the redder bands. A sharp drop occurs in the predicted
u - B color at z ~ 4.5 as the Lyman limit passes through the
B band, rendering near zero flux in both bands.

3.2. Colors of Stellar Contaminants 

At magnitudes brighter than 21, stars are the primary con- 
taminant in AGN selection. Our goal is to classify and choose 
AGN candidates at all ranges of magnitudes (18.0 < i < 
24.5), so it is important to quantify stellar colors since stars 
cannot be distinguished from compact AGN morphologically, 
and they make up about 10% of the catalog even at the
faintest levels.

We characterize the contaminating stellar population at
bright magnitudes (i < 19) to eliminate effects from large
photometric errors at faint magnitudes. This population, a
subset of the stars described in §2.7, can be used in lieu of
the entire star population since we have confirmed empirically
that stellar colors do not change or redden systematically as a
function of apparent magnitude (for this fixed Galactic
latitude and assuming low photometric error). This sample of
1791 stars is sufficient to characterize the color distributions.
Figure 5 shows colors of AGN and stars, indicating which
colors are useful in distinguishing the populations. This
reaffirms u - B as the best discriminator between the two
populations at redshift z < 2.5.

3.3. Colors of Galaxy Contaminants 

Fainter than i ~ 21 (where our selection focuses), the
overwhelming majority of objects are galaxies, and therefore
galaxies are the major contaminant of the AGN population.
As described by Scoville et al. (2007) and Mobasher et al.
(2006), all sources are fit with Hubble-type galaxy SEDs,
yielding photometric redshifts for z < 3. For the sources with
the most reliable photometric redshifts (χ² < 25, only the best
~13% of the CPC), we investigate color as a function of
redshift for the primary contaminants: starburst and spiral
galaxies. Although elliptical galaxies can theoretically be
confused with high-z AGN because they are compact and red,
they are statistically rare and only present in the catalog in
significant numbers at the most recent epochs, z < 0.4. Since
AGN in this low redshift range are much bluer than the red,
compact ellipticals, we can easily reject ellipticals. It is worth
noting that COSMOS galaxy colors with photometric redshifts
are available for z < 3, but photometric redshifts are not
reliable at higher redshifts.

FIG. 7. The primary color contaminants of the AGN sample
are galaxies with young stellar populations: spirals and
starbursts. Objects identified as starbursts (COSMOS catalog
notation T_phot = 5, 6) with good confidence (χ² < 25) are
shown here in (u - B), (B - V), and (V - i) colors as a function
of photometric redshift, compared to the preferred AGN
templates from Figure 6 (solid lines, with the 1σ range as
dotted lines). Spiral galaxy colors are similar at all redshifts.
While there are small windows in redshift where the galaxy
colors differ from AGN, there is no way to incorporate this into
the selection method since there is no prior redshift indication
for our candidates.

Figure 7 shows color u - B as a function of photometric
redshift (for objects with χ² < 25 in the CPC) for starburst
galaxies. Spirals exhibit very similar colors with a slightly
higher overall variance. To clearly show the level of
contamination of the AGN sample, we have overlaid the best
fit AGN template from Figure 6 with its 90% confidence
interval illustrated by the dotted lines (determined previously
in §2.6). Unlike the case for the stellar population, there is
little difference in color between AGN and starburst galaxies,
which is why we add morphological information as the basis
of our low redshift selection technique.
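One way to quantify the overlap described above is to count, in redshift bins, the fraction of galaxies whose color falls inside the AGN template envelope at their photometric redshift. This sketch assumes a Gaussian spread of σ = 0.3 mag about the template (§2.6), so a two-sided 90% interval is ±1.645σ; the template track, galaxy arrays, and function names are hypothetical inputs, not the paper's procedure.

```python
import numpy as np


def within_envelope(gal_z, gal_color, tmpl_z, tmpl_color, sigma=0.3, k=1.645):
    """True where a galaxy's color lies within +/- k*sigma of the AGN template
    color at the galaxy's photometric redshift (k = 1.645: two-sided 90%)."""
    predicted = np.interp(gal_z, tmpl_z, tmpl_color)
    return np.abs(np.asarray(gal_color) - predicted) <= k * sigma


def overlap_fraction(gal_z, flags, edges):
    """Fraction of galaxies flagged as AGN-like in each redshift bin (NaN if empty)."""
    idx = np.digitize(gal_z, edges) - 1
    frac = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            frac[b] = flags[in_bin].mean()
    return frac
```

A fraction near 1 in most bins would correspond to the statement above that starburst colors are nearly indistinguishable from AGN colors.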

We considered the possibility of targeting very faint
(24.5 < i < 25.9) AGN candidates (which due to their faint
magnitudes are not included in the CMC) using only color
information, assuming that the only statistically significant
contaminants are faint blue galaxies. This could work only if
AGN colors and galaxy colors differed consistently over
z < 3, or if two significantly distinct colors (one of them being
u - B, to target the UV excess at low redshift) showed strong
separation between these two populations over smaller but
identical spans in redshift. Unfortunately neither of these criteria is satisfied,
so color selection is not effective at extremely faint magni- 
tudes. We also attempted to target i > 24.5 objects using 
the image FWHM from ground-based data, a less sensitive 
morphological discriminator than Gini calculated using ACS 
data. However, most faint objects regardless of their classi- 
fication by SED as stars, galaxies, or AGN have unresolved 
profiles with 2.0" < FWHM < 2.5". Since it is clear that 
the HST morphology is needed to target faint candidates, we 
limit our selection to targets included in the CMC. 

4. MORPHOLOGICAL SELECTION 

The goal of using morphological selection is to distinguish
the predominantly compact, centrally concentrated AGN from
the typically more extended galaxies that dominate the faint
reaches of the catalog. As described in §2.2, the Gini
coefficient is a non-parametric measure of source concentration,
independent of potential asymmetries or of the nature of the
radial profile. The Gini coefficient provides particularly strong
leverage when targeting slightly resolved AGN galaxies; other
methods (e.g., the concentration index; see Abraham et al. 2003)
require an assumed central pixel and PSF model, while Gini
simply distinguishes the brightest pixels from the much fainter
extended component and is insensitive to the spatial
arrangement of those pixels.

FIG. 8. Gini coefficient as a function of redshift for 268 type 1
AGN (with both morphological and color data) shows that the
majority of the training set of AGN, particularly at z > 2.0,
have G > 0.8, indicating essentially unresolved sources. At
lower redshift, a large spread in G motivates a lower constraint
on G (also a function of magnitude, as shown in Figure 2).
Significant color contamination by galaxies makes it impossible
to recover any well-resolved AGN galaxies with G < 0.65.
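For reference, a common estimator of the Gini coefficient sorts the pixel fluxes assigned to a source and weights them by rank; it equals 0 when all pixels have equal flux and 1 when all the flux is in a single pixel. The sketch below uses that standard sorted-pixel form. Which pixels belong to the source (the segmentation map) is left to the caller and is an assumption here, as is the function name; this is not presented as the exact implementation of §2.2.

```python
import numpy as np


def gini(pixel_fluxes):
    """Gini coefficient of the (absolute) pixel fluxes belonging to one source.

    G = sum_i (2i - n - 1) |x_i| / (mean(|x|) * n * (n - 1)), with |x_i| sorted
    in increasing order and i running from 1 to n.
    """
    x = np.sort(np.abs(np.asarray(pixel_fluxes, dtype=float).ravel()))
    n = x.size
    if n < 2 or x.sum() == 0.0:
        raise ValueError("need at least two pixels with non-zero total flux")
    rank = np.arange(1, n + 1)
    return float(np.sum((2 * rank - n - 1) * x) / (x.mean() * n * (n - 1)))


# A compact source (flux concentrated in a few pixels) versus an extended one.
yy, xx = np.mgrid[-10:11, -10:11]
compact = np.exp(-(xx**2 + yy**2) / (2 * 1.0**2))
extended = np.exp(-(xx**2 + yy**2) / (2 * 6.0**2))
print(gini(compact), gini(extended))
```

Note that only the sorted flux values enter the formula, which is the sense in which Gini is insensitive to the spatial arrangement of the pixels.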

Figure 8 shows the Gini coefficient as a function of redshift
for the 268 type 1 AGN that had confirming spectroscopy in
the training set. While some low and moderate redshift AGN
are well resolved with 0.3 < G < 0.7, the majority (~70%),
particularly for z > 1, are unresolved with G > 0.8. Since we
have chosen to divide our selection algorithm into low,
intermediate and high redshifts, we use two Gini criteria, one
for low redshift and a stricter one shared by the intermediate
and high redshift selections, following the behavior illustrated
in Figure 8. Although it is difficult to use models to predict the
behavior of Gini with redshift (due to cosmic evolution and the
wide range of host galaxy properties), we already know that
AGN are largely unresolved for z > 1.5. The Gini coefficient
will naturally increase with redshift due to the diminishing
contribution from the host galaxy, particularly in the i band,
through which the 4000 Å break passes at z ~ 1.

Figure 2 showed that Gini effectively separates unresolved
stars from galaxies. Given also that most AGN are unresolved
or marginally resolved in terms of Gini (Figure 8), Figure 2
shows the cuts made in Gini and magnitude to select AGN. At
faint magnitudes, unresolved sources have lower G (the right
end of the arc), so to include them we drop the lower limit of
G to 0.65 as shown.
At brighter magnitudes, we allow more extended or resolved 
sources in our low redshift candidate pool (all objects above 
the solid line), but we set a more stringent selection for high 
redshift candidates at higher G (all objects above the dashed 
line) assuming high redshift AGN are less resolved than their 
low redshift counterparts. This distinction probably has only a 
small effect on high-z candidate selection (since for i < 22.5, 
high-z AGN are rare). 
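The Gini-magnitude acceptance regions can be written compactly using the criteria listed in Tables 3 and 4: 0.65 < G < 1.0 together with either a constant floor (0.75 for the low-z pool, 0.80 for the int-z and high-z pool) or the magnitude-dependent line G > -0.067 i_AUTO + 2.25. A minimal NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def passes_gini_cut(gini, i_auto, g_floor):
    """Gini-magnitude acceptance as listed in Tables 3 and 4.

    g_floor = 0.75 for the low-z pool and 0.80 for the int-z/high-z pool.
    At bright magnitudes the constant floor is the effective cut; at faint
    magnitudes the sloped line takes over, down to the hard limit G > 0.65.
    """
    g = np.asarray(gini, dtype=float)
    i = np.asarray(i_auto, dtype=float)
    in_range = (g > 0.65) & (g < 1.0)
    above = (g > g_floor) | (g > -0.067 * i + 2.25)
    return in_range & above
```

Because the two conditions are combined with OR, the effective threshold at each magnitude is the lower of the floor and the sloped line, which is how fainter, less concentrated unresolved sources are kept.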



FIG. 9. The behavior of the Gini coefficient with magnitude, as shown
in Figure 2 but with contours replacing the scatterplot. The stellar locus and
the region occupied by galaxies are well separated. This plot has the addition 
of the AGN training set (crosses) and spectroscopically confirmed Narrow 
Emission Line Galaxies found using SDSS quasar color selection methods 
(diamonds). The morphological criteria adopted for this survey (shown by 
solid and dotted lines) exclude most NELGs while including most of the 
training AGN. 

The set of 168 NELGs from Prescott et al. (2006) supports
the previously described morphological selection boundaries.
The majority of the 168 NELGs (96%) are well resolved with
G < 0.75. Only 7 are accepted as AGN candidates by the low
redshift Gini criteria (4%), and only 1 is accepted by the high-z
criteria, with G > 0.8 (< 1%). Figure 9 indicates that the
majority of AGN color contaminants will be cleanly separated
from AGN via morphological selection. This plot shows the
same data as Figure 2 (simplified to contours) but also
overplots the training AGN (crosses) as well as the NELG
contaminants (diamonds). We cannot use this result
quantitatively because the statistics might be intrinsically
different at fainter magnitudes, where NELGs may be less well
resolved and so more often confused with AGN.

5. AGN SELECTION 

The primary goal of targeting the AGN population is to un- 
derstand the nature of low luminosity AGN evolution and spa- 
tial distribution. The AGN selection strategy can be judged 
in terms of targeting efficiency and completeness in recov- 
ering the predicted population. Assessing the efficiency and 
completeness of our algorithm depends on prior knowledge of 
the QLF, while also requiring additional information on the 
surface density of contaminating stars and galaxies down to 
the limiting magnitude of the AGN candidates. AGN number 
counts have already been discussed above, and a good estimate
of the stellar population is given in §2.7, but the galaxy
contamination is the most important and unfortunately the
most difficult to assess. Defining the AGN selection algorithms
depends strongly on the ability to estimate the efficiency of the
selection techniques. We used an iterative process in which
each selection strategy yields efficiency estimates (in this
section, based solely on the training set), and efforts to
improve efficiency in turn alter the selection methods. In the
following two subsections we describe how we define our
selection procedure and the motivations guiding our decisions.

5.1. Low-z AGN Selection 

The AGN population at z < 3 has been well studied to
i_AB < 22.5, but we can target AGN candidates down to
i_AB = 24.5 using G and u - B. As previously discussed,
coupling these two parameters gives a clear advantage by
separating galaxy, AGN, and stellar populations. In Figure 10,
galaxies largely have G < 0.75 (with starbursts at the bluer
end of the distribution), stars are compact (G > 0.8) and
relatively red with u - B ≳ 0.7, and the training set AGN,
represented by diamonds in Figure 10, are generally compact
(high G) and blue (u - B < 0.7). Table 3 summarizes the
low-z selection algorithm step by step; it is described below.

In this subsection (§5.1 only), we define the survey's
completeness and efficiency in terms of the AGN training set.
Completeness is the fraction of training set AGN recovered by
the selection criterion, and efficiency is the number of AGN
recovered relative to the number of star and galaxy
contaminants. Depending on our efficiency and completeness
goals, we can vary the cuts in G and u - B to isolate AGN
candidates. The core region we use for the candidate AGN pool
is bounded by 0.75 < G < 1.0 and -0.5 < u - B < 0.6. It
contains 51% of the training data, which is therefore an
approximation to the selection completeness for z < 3. We
choose 0.75 as a lower limit on Gini (rather than the more
stringent choice of 0.8) because there is a moderately sized
population of 20 AGN with 0.75 < G < 0.80, and Figure 8
shows that the dispersion of G at low redshift is high enough
to warrant a lower boundary. The objects with marginally high
G could be at the Seyfert/QSO boundary with visible hosts.
Figure 2 highlighted the Gini selection criteria for candidates:
the acceptance region for low redshift objects is above the
solid line.
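As a concrete reading of the completeness definition above, the fraction of training AGN falling inside the core region (0.75 < G < 1.0, -0.5 < u - B < 0.6) can be computed directly. The inputs in this sketch are hypothetical stand-ins for the training-set Gini values and colors, and the function name is illustrative only.

```python
import numpy as np


def region_completeness(gini, u_minus_b, g_lo=0.75, g_hi=1.0, c_lo=-0.5, c_hi=0.6):
    """Fraction of training AGN inside a rectangular (G, u - B) region."""
    g = np.asarray(gini, dtype=float)
    c = np.asarray(u_minus_b, dtype=float)
    inside = (g > g_lo) & (g < g_hi) & (c > c_lo) & (c < c_hi)
    return float(inside.mean())
```

Applied to the real training set, this quantity is the 51% quoted in the text; widening `c_hi` or lowering `g_lo` raises it at the cost of admitting more contaminants.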

Another strategy would be to accept candidates with
u - B < 0.1 and all values of G, to recover some of the bluest
and most well-resolved, low-z AGN from the training set. Al- 
though this added region does recover 7 training set AGN, 
there is a sharp increase in number of contaminants (resulting 
in a 150% increase in number of candidates) as the selection 
skirts the blue edge of the large galaxy population (primar- 
ily starbursts). Since the anticipated gain in completeness is 
small, only 2%, we do not include this region in AGN candi- 
date selection. 

The final aspect of the low-z selection is to set an upper
bound on u - B. While AGN and stars appear to separate most
cleanly for u - B < 0.6, many AGN have 0.6 < u - B < 0.9,
a region that overlaps hot main sequence stars and white
dwarfs. While including this area would increase training set
completeness from 51% to 61%, the sharp increase in overlap
with stars would dramatically reduce the efficiency. The
choice of (u - B)_max = 0.67 was made by defining and
contrasting the training set completeness and efficiency. We
measure completeness for a given (u - B)_max as the fraction
of the 292 training AGN that fall within the bounds of our
selection criteria. With the selection criteria in Gini and
magnitude as in Figure 2 and u - B < (u - B)_max, the
completeness increases as we increase (u - B)_max. The
efficiency (also a function of (u - B)_max) is measured as the
number of training AGN selected divided by the total number
of candidates accepted by the algorithm, where the total
includes the training set. These limited definitions of
completeness and efficiency are distinct from the expected
efficiency or completeness of the overall survey, which will be
discussed in §6.

The region in question (0.6 < u - B < 0.9) contains a
sizable fraction of the 2.3 < z < 3.0 AGN, which are
historically difficult to target. Figure 11 shows the training
set completeness and fractional training set efficiency as
functions of (u - B)_max. The training set completeness
increases with (u - B)_max while the efficiency decreases.
Their intersection at (u - B)_max = 0.67 defines the best upper
bound on low-z color selection.
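The completeness and efficiency curves of Figure 11 follow from the definitions above. This sketch assumes boolean arrays flagging which training AGN and which catalog objects (training set included) already pass the Gini-magnitude cut, plus their u - B colors; it scans trial values of (u - B)_max and linearly interpolates the first crossing of the two curves. Function and argument names are hypothetical, and efficiency here is the raw fraction rather than the normalized "fractional" efficiency plotted in Figure 11.

```python
import numpy as np


def scan_color_cut(train_ub, train_gini_ok, all_ub, all_gini_ok, cmax_grid, cmin=-2.0):
    """Training-set completeness and efficiency as functions of (u - B)_max.

    completeness = training AGN selected / all training AGN
    efficiency   = training AGN selected / all candidates selected
    (the candidate catalog is assumed to include the training AGN).
    """
    n_train = train_ub.size
    completeness, efficiency = [], []
    for cmax in cmax_grid:
        t_sel = train_gini_ok & (train_ub > cmin) & (train_ub < cmax)
        a_sel = all_gini_ok & (all_ub > cmin) & (all_ub < cmax)
        completeness.append(t_sel.sum() / n_train)
        n_all = a_sel.sum()
        efficiency.append(t_sel.sum() / n_all if n_all else np.nan)
    return np.array(completeness), np.array(efficiency)


def first_crossing(grid, completeness, efficiency):
    """Linearly interpolated grid value where completeness first equals efficiency."""
    diff = completeness - efficiency
    idx = np.nonzero(np.diff(np.sign(diff)))[0]
    if idx.size == 0:
        return None  # the curves do not cross on this grid
    k = idx[0]
    return grid[k] - diff[k] * (grid[k + 1] - grid[k]) / (diff[k + 1] - diff[k])
```

Choosing the crossing point is a simple way to balance the two competing quantities without weighting one above the other.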

Noting the evolution of AGN color with redshift (Figure 6),
we see that beyond z ~ 2.5, u - B for most AGN reddens very
quickly (entering the stellar locus), and this selection is no
longer effective. This foreshadows why the high-z selection
algorithm must use more than one color. The solid line
outlines our selection criteria for the brightest candidate
objects (i_AB < 22.5), and the dashed line extends the region
to lower G for fainter candidates only (the Gini-magnitude
selection for low redshift objects is illustrated in Figure 2 by
the solid line). The low-z selection algorithm recovers 169 of
the original 292 training AGN (58%) and yields a total of 2201
candidates across the 2 deg² COSMOS field.
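The full low-z cascade of Table 3 can be expressed as a sequence of boolean masks, recording the surviving count after each step in the same spirit as the table. The catalog here is a hypothetical dictionary of NumPy arrays with assumed column names (`u_det`, `B_det`, `i_auto`, `gini`, `u`, `B`), and `excluded` is an assumed flag for X-Ray point sources and training AGN; none of these names come from the COSMOS catalogs.

```python
import numpy as np


def lowz_cascade(cat, excluded):
    """Apply the Table 3 low-z cuts in order; return the final mask and stage counts."""
    stages = []
    mask = cat["u_det"] & cat["B_det"]
    stages.append(("u, B detections", int(mask.sum())))

    mask = mask & (cat["i_auto"] < 25.5)
    stages.append(("brightness", int(mask.sum())))

    g, i = cat["gini"], cat["i_auto"]
    gini_ok = (g > 0.65) & (g < 1.0) & ((g > 0.75) | (g > -0.067 * i + 2.25))
    mask = mask & gini_ok
    stages.append(("Gini", int(mask.sum())))

    ub = cat["u"] - cat["B"]
    mask = mask & (ub > -2.0) & (ub < 0.67)
    stages.append(("u - B", int(mask.sum())))

    mask = mask & ~excluded
    stages.append(("exclude X-Ray/training", int(mask.sum())))
    return mask, stages
```

On the real catalog the stage counts would be compared against the Nobj column of Table 3 (190316, 188553, 17697, 2370, 2201).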

5.2. Intermediate-z and High-z AGN Selection 

Unlike the low-z selection method, no single color can
effectively be used to target z > 2.5 AGN; strong
contamination by the stellar locus makes that impossible. At
z > 3 there are no training data of spectroscopically confirmed
AGN, so we rely exclusively on the AGN template predictions
illustrated in Figure 6 and on number counts from the X-Ray
point sources with high G (there are 444 above the high-z line
in Figure 2). Since the X-Ray sources are not guaranteed to be
AGN (only ~50% are type 1), their use in designing the
selection algorithm is loose, serving only as a guide
supplementing the templates. We group the intermediate
redshift (2.5 < z < 3.0, dubbed int-z) selection together with
the high redshift selection (z > 3) since they use the same
variables and act as two subsections of a larger selection
technique. Below we describe this overall technique, and split
it into the two redshift regimes where the separation is needed.

To select objects with z > 2.5 we need more color information
than was used to define our low redshift algorithm. Figure 12
shows the expected V - i color of high-z AGN, along with the
same galaxy and star populations as shown in Figure 10.
Overlap with the stellar locus is severe in the redshift range
we are targeting and does not let up until z ~ 5. Although
we could avoid this problem by using an even redder baseline
(e.g., V - z, r - z, or z - K), the limited depth of the catalogs
and the photometric errors in these bands make faint, high
redshift AGN selection impossible. Instead, we incorporate
a second optical color, B - V, which goes much deeper than
the redder bands (refer to Table 1 for limiting magnitudes) and
which, when coupled with V - i, shows promising separation
between the stellar locus and AGN template color predictions
for z > 2.5. To step through the stages in the int-z and high-z
selection, the reader should refer to Table 4.

The goal of the selection algorithm is to define the optimal
AGN color domain without accepting significant numbers of
stellar contaminants. At high redshift, AGN are generally
redder, fainter, and more likely to be unresolved. This final
assumption is based both on the behavior of our small training
set and on the physical and observational constraints at high
redshift. Therefore (coupled with the data shown in Figure 8),
we require G > 0.8 for intermediate and high redshift
candidates. The full high-z G acceptance area is shown in
Figure 2 as the region above both the solid and dashed lines;
these objects make up the initial int-z and high-z candidate
sample, and they are shown in the upper panel of the
(B - V)(V - i) diagrams in Figure 13. The gray area represents
a 90% envelope of all AGN template predictions for z > 2.5
(the redshifts are marked by "25", etc.). The central lines
indicate the two best fit AGN template paths through the color
plane (best fit lines to B - V and V - i as shown in Figure 6).
The bottom panel of Figure 13 shows the int-z and high-z
selection methods in relation to template predictions and the
stellar locus; these are described sequentially below. The area
shaded by horizontal lines in the bottom panel of Figure 13
represents the z > 3.0 AGN population outlined by templates
(our high-z selection), while the area shaded by widely spaced
diagonal lines represents the 2.5 < z < 3.0 AGN population
(our int-z selection). The stellar locus is converted into a
contour plot to schematically show areas of high stellar
contamination. Compact X-Ray point sources, while not
guaranteed to be AGN, are overplotted as diamonds to
supplement the areas highlighted by templates.

FIG. 10. Color and morphological properties can distinguish the majority of AGN from their stellar and galaxy contaminants. In this case, the separation is
presented in terms of G vs. u - B. The galaxies (the cloud of small dots to the left) have G < 0.75, the stars (the smaller collection of dots at the top right) have
high G but primarily u - B > 0.67, and the AGN (diamonds) are more compact than almost all galaxies and bluer than almost all stars. There is a small set of
well-resolved AGN galaxies for which this selection is not effective due to heavy overlap with the blue end of the galaxy locus, primarily starburst galaxies, as
well as a number of AGN with colors similar to the hottest stars.

TABLE 3
Low Redshift AGN Candidate Selection Process

Nobj      Process                  Details
195706    Objects in the CMC
190316    u, B are detections
188553    Brightness criterion     i_AUTO < 25.5
17697     Gini criterion           0.65 < G < 1.0 AND [G > 0.75 OR G > -0.067 × i_AUTO + 2.25]
2370      u - B criterion          -2.00 < u - B < 0.67
2201      X-Ray/Training Data      Exclude XRPS and Training Sample

TABLE 4
Intermediate and High Redshift AGN Candidate Selection Process

Nobj      Process                                Details
195706    Objects in the CMC
192230    Brightness criterion                   i_AUTO < 25.5
18475     Gini criterion                         0.65 < G < 1.00 AND [G > 0.80 OR G > -0.067 × i_AUTO + 2.25]
594       High-z: selection in (B - V)(V - i)    V - i < 1.15 × (B - V) - 0.31
515       High-z: remove training data           Exclude XRPS and Training Sample
702       High-z: add blue dropouts              Include 187 Blue Dropout Objects
1188      Int-z: selection in (B - V)(V - i)     V - i > 1.15 × (B - V) - 0.31 AND [V - i < 0.15 OR [V - i < 0.5 AND V - i > 1.4 × (B - V) + 0.1]]
913       Int-z: remove training data            Exclude XRPS and Training Sample

FIG. 11. Low redshift (z < 2.5) AGN selection completeness from
the training set (solid line) increases as a function of the upper bound,
(u - B)_max. The efficiency from the training set (dashed line) decreases over
the same range of upper bounds since the region has a high density of hot
stars. The intersection of these two quantities occurs at (u - B)_max =
0.67 (dotted vertical line), which is chosen as the upper bound on the low-z
acceptance region.

FIG. 12. Selecting from the same catalog as for Figure 10, using a redder
baseline and a selection technique similar to that for low-z AGN is inefficient
for high-z AGN; the overlap of predicted AGN colors (straight lines marked
with redshift) with the stellar locus is severe. The 90% confidence interval on
AGN template colors is shown in the upper left. This single-color technique
(e.g., defining the acceptance region by V - i < 0.1 and G > 0.8) rejects
z > 3.0 AGN 60% of the time and admits large numbers of contaminating stars.

For efficient selection at z > 3 we make a diagonal cut
in the two-color diagram, described by the line
V - i < 1.15 × (B - V) - 0.31 (shown in the plot as a heavy
solid line). The region below this line contains 594 objects,
including 54 X-Ray sources (12% of all 444 X-Ray sources
considered for high-z selection), and a significant portion of
the 90% AGN color envelope for z > 3, justifying the
criterion for the high-z candidate pool. After removing all
X-Ray sources and training data from the selected objects
(about 13% of the 594), there are 515 candidates for the
high-z selection algorithm.
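The high-z color criterion is a single linear inequality in the (B - V, V - i) plane. A minimal sketch with a hypothetical function name, with the candidate bookkeeping quoted in the text reproduced as plain arithmetic:

```python
import numpy as np


def highz_color_cut(b_minus_v, v_minus_i):
    """True for objects below the diagonal V - i = 1.15 (B - V) - 0.31."""
    bv = np.asarray(b_minus_v, dtype=float)
    vi = np.asarray(v_minus_i, dtype=float)
    return vi < 1.15 * bv - 0.31


# Bookkeeping quoted in the text: 594 objects below the line, 515 after
# removing X-Ray sources and training AGN, 702 after adding 187 blue dropouts.
removed_fraction = (594 - 515) / 594   # about 0.13
total_highz = 515 + 187                # 702
```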

The intermediate redshift AGN occupy the area at the base
of the stellar locus, blueward of the heavy diagonal line
defining the high-z selection area. Contamination in this
region is increased greatly by the population of hot main
sequence stars and white dwarfs on the blue end of the stellar
locus. Since contamination is expected to be much higher at
these redshifts, we treat intermediate redshift AGN candidates
separately from the high redshift AGN candidates discussed in
the previous paragraph. The region bounded by V - i < 0.15
and V - i > 1.15 × (B - V) - 0.31 (the heavy line) contains
814 objects, 93 of which are X-Ray sources (21% of the X-Ray
sample). The upper limit (V - i)_max = 0.15 was chosen in a
similar way to (u - B)_max in §5.1, since the anticipated
contamination rate greatly increases for redder values of
V - i. We include another region in the intermediate redshift
selection, having noted that many X-Ray sources are bluer
than the galaxy/star locus and that templates predict that
intermediate redshift AGN will occupy the area with
V - i < 0.5 and V - i > 1.4 × (B - V) + 0.1. This adds 374
more candidates and 115 more X-Ray sources (up to 46% of
the X-Ray sample). The total number of intermediate redshift
AGN candidates is 913 after removing X-Ray sources and the
training set (from an original 1188). The acceptance region for
int-z selection is shown in the bottom panel of Figure 13,
enclosed by the heavy dashed line and solid line.
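The int-z acceptance region combines the complement of the high-z cut with two blue sub-regions, exactly as listed in Table 4. A minimal sketch (the function name is hypothetical):

```python
import numpy as np


def intz_color_cut(b_minus_v, v_minus_i):
    """Int-z region: above the high-z diagonal AND either V - i < 0.15
    or inside the wedge V - i < 0.5, V - i > 1.4 (B - V) + 0.1."""
    bv = np.asarray(b_minus_v, dtype=float)
    vi = np.asarray(v_minus_i, dtype=float)
    above_highz_line = vi > 1.15 * bv - 0.31
    blue_box = vi < 0.15
    wedge = (vi < 0.5) & (vi > 1.4 * bv + 0.1)
    return above_highz_line & (blue_box | wedge)


# Counts quoted in the text: 814 objects in the V - i < 0.15 region plus 374
# added by the wedge gives the 1188 int-z objects listed in Table 4.
total_intz = 814 + 374
```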

One final category of targets is considered as potential high
redshift AGN. We have included 187 blue dropouts, objects
detected in the i band but not in the B band. These are
interesting because of their very red color (B - i > 2) and
faint magnitude (i_AB ~ 24.0), which potentially correspond
to very high redshift AGN in the range 3.4 < z < 5.5. There
are only 638 blue dropouts in the entire COSMOS catalog, of
which 187 satisfy the high-z Gini cut described in §4. We add
these objects to the high-z candidate list, bringing the total
number up to 702.
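A blue dropout has only a lower limit on B - i, set by the B-band detection limit. The sketch assumes a boolean B-detection flag and a single limiting magnitude `b_limit`; the value 27.0 used in the example is a placeholder, not the catalog's actual depth, and the function name is hypothetical.

```python
import numpy as np


def blue_dropouts(i_mag, b_detected, b_limit):
    """Flag i-detected, B-undetected objects and return B - i lower limits.

    For a non-detection the true B magnitude is fainter than b_limit,
    so B - i > b_limit - i (NaN for objects that are not dropouts).
    """
    i_mag = np.asarray(i_mag, dtype=float)
    b_detected = np.asarray(b_detected, dtype=bool)
    is_dropout = ~b_detected & np.isfinite(i_mag)
    lower_limit = np.where(is_dropout, b_limit - i_mag, np.nan)
    return is_dropout, lower_limit


drop, bi_min = blue_dropouts([24.0, 23.0, 25.5], [False, True, False], b_limit=27.0)
red_enough = drop & (bi_min > 2.0)  # the B - i > 2 color quoted in the text
```

A dropout whose lower limit falls below 2 is not necessarily blue; its limit is simply too shallow to confirm the red color, so the Gini cut and the lower limit together only bound the sample.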

6. ESTIMATING POPULATION STATISTICS AND ALGORITHM 
EFFICIENCY 

Because we have not yet carried out spectroscopic
observations of our candidates, we cannot directly or reliably
determine the efficiency or completeness of our algorithms. Instead
we construct a contextual argument by roughly estimating quasar,
star, and galaxy population statistics. By comparing our selection
technique to other current algorithms in the literature, we estimate
an efficiency of ~30-50% and a completeness of >60%. While detailed
Monte Carlo simulations could be used to estimate the completeness
more precisely, that is not the focus of this paper. Instead, a
follow-up paper detailing the yields of this study will explore more
carefully our method's robustness in choosing low-luminosity
AGN or potentially faint high redshift AGN.
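Efficiency and completeness are used here in their standard sense. Once spectroscopic identifications exist, they could be computed for each algorithm as in the minimal sketch below; the function name and boolean inputs are illustrative assumptions, not part of the survey pipeline.

```python
from typing import Sequence, Tuple


def efficiency_completeness(selected: Sequence[bool],
                            is_agn: Sequence[bool]) -> Tuple[float, float]:
    """Return (efficiency, completeness) for one selection.

    efficiency   = selected true AGN / all selected objects
    completeness = selected true AGN / all true AGN in the parent sample
    NaN is returned when a denominator is zero.
    """
    if len(selected) != len(is_agn):
        raise ValueError("inputs must have the same length")
    n_selected = sum(selected)
    n_agn = sum(is_agn)
    n_hits = sum(1 for s, a in zip(selected, is_agn) if s and a)
    eff = n_hits / n_selected if n_selected else float("nan")
    comp = n_hits / n_agn if n_agn else float("nan")
    return eff, comp


# Hypothetical example: four objects, two selected, two true AGN, one overlap.
print(efficiency_completeness([True, True, False, False],
                              [True, False, True, False]))  # (0.5, 0.5)
```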

FIG. 13. B − V and V − i are used to select AGN with z > 2.5, with the addition of a prior cut on Gini shown by the dashed line in Figure [?]. Without any training data above z = 3.0, templates are used to extrapolate the colors of AGN at high redshift. The two best-fitting type 1 AGN templates (relative to low z AGN in the same colors) are shown in the top panel (dashed lines) and marked at redshifts 2.5, 3.0, 3.5 and 4.0 by "25", "30", "35", and "40". The striped region is the 90% envelope for potential AGN colors, adopted from the variance in B − V and V − i of the training set AGN (for lack of better information at high redshift, the variance is assumed to be constant). The bottom panel shows contours converted from the scatter plot in the upper panel, with the template regions divided into intermediate redshift (gray shaded) and high redshift (striped) regions. The solid line is the upper bound of the acceptance region of the high redshift selection, while the dashed line is the upper bound of the intermediate redshift selection (bounded below by the solid line). X-Ray point sources were used as a rough guide to the selection but are not plotted, since their scattered distribution would add little visual information.

The Quasar Luminosity Function (QLF) has been studied
extensively over the past decade, with increasingly rich
statistics used to verify descriptive functional forms
such as the double power law (e.g., Pei 1995; Peterson 1997;
Boyle et al. 2000; Croom et al. 2004), which may be used to
predict the faint end of the QLF out to z ~ 6.
The most useful treatments of and observational insights into
the QLF in the literature turn out to be inappropriate for the
range of magnitudes we need (Richards et al. 2005, 2006;
Jiang et al. 2006). We instead use the pure luminosity evolution
(PLE) model of Hopkins, Richards, & Hernquist (2007), who
tied together several data sets for the best reliability at the
faint end, where most of our targets lie. The luminosity-dependent
density evolution (LDDE) model is also often used to describe
the QLF; however, at the faintest magnitudes we suspect it
overestimates the quasar counts by ~2 dex and is inappropriate
in this context. The predicted QLFs (PLE and LDDE) are
shown in Figure 14. The predicted AGN number counts for
the COSMOS field (from these QLFs at various redshifts) are
shown in Figure 15, with 1σ errors propagated from the errors
in the QLFs.
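To make the PLE/double power law combination concrete, the sketch below evaluates Φ(L, z) = φ* / [(L/L*)^γ1 + (L/L*)^γ2] with only L* evolving in redshift. It assumes L*(z) is a cubic polynomial in ξ = log10[(1 + z)/(1 + z_ref)]; consult the equations cited in footnote 7 for the exact Hopkins et al. (2007) form. All parameter values are placeholders, not the fitted values, and converting Φ into number counts (integrating over luminosity and comoving volume, with bolometric corrections) is omitted.

```python
import math


def log_l_star_ple(z, log_l_star0, k1, k2, k3, z_ref=2.0):
    """log10 L*(z), assumed cubic in xi = log10[(1 + z) / (1 + z_ref)]."""
    xi = math.log10((1.0 + z) / (1.0 + z_ref))
    return log_l_star0 + k1 * xi + k2 * xi ** 2 + k3 * xi ** 3


def qlf_ple(log_l, z, phi_star, log_l_star0, gamma1, gamma2, k1, k2, k3, z_ref=2.0):
    """Double power-law QLF with pure luminosity evolution.

    Phi(L, z) = phi_star / [(L/L*)^gamma1 + (L/L*)^gamma2];
    only L* depends on redshift, while phi_star and the slopes are fixed.
    """
    x = 10.0 ** (log_l - log_l_star_ple(z, log_l_star0, k1, k2, k3, z_ref))
    return phi_star / (x ** gamma1 + x ** gamma2)


# Placeholder parameters (NOT the Hopkins et al. 2007 fit): at L = L*, Phi = phi_star / 2.
print(qlf_ple(log_l=12.0, z=2.0, phi_star=1e-6, log_l_star0=12.0,
              gamma1=0.5, gamma2=2.0, k1=0.0, k2=0.0, k3=0.0))  # 5e-07
```

In a PLE model the shape of the QLF is fixed and slides in luminosity with redshift, whereas an LDDE model lets the density evolution depend on luminosity; that extra freedom is one reason the two can diverge so strongly at the faint end.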

Figure 16 gathers all predicted number counts for QSOs,
stars (Robin et al. 2007), and galaxies (Capak et al. 2007) in
relation to the number counts of objects in the CMC catalog.

7 See equations 8-10, 17-20 and Table 3 of Hopkins et al. (2007) for the
details of the PLE treatment, and equations 11-16 and Table 4 for LDDE. We
chose the "FULL" model (as it is called therein) because it performs best
across all magnitude ranges and takes quasar count data from all magnitudes
into consideration.
FIG. 14. The Quasar Luminosity Function at selected redshifts
for the PLE model (solid), the LDDE model (dotted), and the bright end QLF
derived by Richards et al. (2006) (dashed) from the SDSS DR3 quasar counts
(crosses). These were calibrated through bolometric corrections to agree at
the bright end (the realm of SDSS data), and they diverge at the faint end. The
QLF models produce widely differing predictions of the number counts of
quasars at the faint level of COSMOS.




FIG. 15. The predicted number counts of quasars in the COSMOS
field (1.7 square degrees) for the low redshift interval (0.3 < z < 2.4), the
intermediate redshift interval (2.5 < z < 3.0), and the high redshift interval
(3.0 < z < 5.5). The PLE quasar number counts are given by the black line,
with the 1σ margin of error as the horizontally shaded regions. The LDDE
model is represented by the gray line with the 1σ diagonally shaded region; at
high redshifts and faint magnitudes, the LDDE quasar number count is not
well constrained and has no lower limit. This reinforces the need
to target faint AGN, so that the QLF may be better constrained in regimes
where few data exist today.

FIG. 16. Logarithmic and linear plots of the expected number counts of
objects in the candidate pool. The solid line represents all objects in the CMC,
while the triple-dot-dashed line shows the parent distribution of objects in the
CPC over the same magnitude range (see Figure 1). The stellar population
identified by Robin et al. (2007) is shown as the dashed line; at the faintest
magnitudes it constitutes about 10% of the CMC contents. An estimate of
galaxy counts is shown as the dot-dashed line and is given in Capak et al.
(2007), referencing previous galaxy count work from COSMOS, H-HDF-N,
HDF-N, HDF-S, Herschel, SDSS, CFDF and CFHT (Leauthaud et al. 2007;
Capak et al. 2004; Williams et al. 1996; Metcalfe et al. 2001; Yasuda et al.
2001; McCracken et al. 2003, 2007). The predicted QLF counts (from Figure 15)
are shown with appropriate error bars, PLE as the heavy solid line
and LDDE as the dotted line. The LDDE formulation is inappropriate at the
faintest magnitudes, where it predicts that quasars would constitute the entire
contents of the catalog. Therefore the PLE formulation of the QLF (of order
1% of the CMC number counts) is adopted.

FIG. 17. The magnitude distribution of all selected objects (solid line)
through the different redshift algorithms. The selected stars are shown as
a dashed line; they constitute about 5% of the overall stellar population
from Robin et al. (2007) and ~50% of all selected objects. The QLF count
predictions (dotted line) show the AGN numbers relative to the total number
of selected objects. The contamination from faint blue, compact galaxies can
be inferred from this information (by subtracting stars from the total), but
there is no reliable method to measure those numbers directly.

8 e.g., the COSMOS F814W Weak Lensing Catalog (Leauthaud et al. 2007),
the Hawai'i Hubble Deep Field (Capak et al. 2004), Hubble Deep Field North
(Williams et al. 1996; Metcalfe et al. 2001), Hubble Deep Field South and
Herschel Deep Field (Metcalfe et al. 2001), SDSS (Yasuda et al. 2001),
Canada France Deep Field (McCracken et al. 2003), and the CFHT Legacy
Survey (McCracken et al. 2007).

At the faint level of this survey, AGN and stars are well known to be
minor components of a population made up primarily of galaxies.
The galaxy count (not previously discussed) given by Capak et al. (2007)
is based on external measures of the galaxy surface density, which agree
with several other surveys up to 80% completeness at i = 26.5 (see
footnote 8). Galaxies thus vastly outnumber stars and AGN. Figure 17
shows the stellar contaminants relative to the total number of selected
objects, as well as the predicted AGN densities from the PLE QLF. Galaxy
predictions are not shown because galaxies are so numerous that their
contribution to the selected sample cannot be reliably determined. Relative
to the AGN counts, the stellar contaminants are roughly 3-4 times more
numerous at low redshift and ~10 times more numerous at high redshift,
and hypothetically constitute ~1/2 of all selected candidates (although,
given that our methodology is biased against selecting stars, this is
highly unlikely).
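The star, AGN, and galaxy bookkeeping behind Figure 17 can be sketched as simple arithmetic. In this illustration (function name and counts are hypothetical), the remainder after subtracting stars and predicted AGN is what would be attributed to faint blue, compact galaxies, and the AGN fraction is only an upper limit on efficiency because it assumes every predicted AGN is actually selected.

```python
def contamination_budget(n_selected: float, n_stars: float, n_agn_pred: float) -> dict:
    """Split a selected sample into predicted AGN, stars, and an inferred remainder.

    The remainder (selected - stars - predicted AGN) is attributed to faint
    blue, compact galaxies; it is inferred, not measured. The AGN fraction is
    an upper limit on efficiency because it assumes every predicted AGN is
    actually selected.
    """
    remainder = n_selected - n_stars - n_agn_pred
    if remainder < 0:
        raise ValueError("stars plus predicted AGN exceed the selected total")
    return {
        "agn_fraction_upper": n_agn_pred / n_selected,
        "star_fraction": n_stars / n_selected,
        "stars_per_agn": n_stars / n_agn_pred,
        "galaxy_remainder": remainder,
    }


# Hypothetical counts: stars are half the sample and 4x the predicted AGN.
print(contamination_budget(n_selected=1000, n_stars=500, n_agn_pred=125))
```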

FIG. 18. The completeness (top panels) and efficiency (bottom panels)
of the algorithms as defined by the X-Ray point sources. Note that the
completeness in each upper panel is the number of selected XRPS in each
algorithm divided by 1073, the total number of XRPS, and does not relate to
predictions from the QLF. While only 57 XRPS are selected in high-z (low
completeness for XRPS), the efficiency is quite high. Although the XRPS
represent only a subset population, likely consisting mostly of low-z sources,
they are valuable for understanding the true effectiveness of each technique.

A recent study by Siana et al. (2007) presents an optical and
IR selection technique for QSOs at high z down to i ~ 22.
Combining confirming spectroscopy of ~10 QSOs with detailed
Monte Carlo simulations, they conclude a completeness of 80-90%.
This estimate rests on the premise that QSOs exhibit the colors of
QSO templates and are selected by their tracks in color-color space,
which differ from those of stellar contaminants. Given the similarity
of our techniques, we reach a very similar conclusion: at high redshift
(z > 3), we expect our selection algorithm, as shown in Figure 13, to
have a high completeness (>60%). A thorough assessment of this success
rate will be included in a follow-up paper detailing observational
results and yields.
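A template-based Monte Carlo completeness estimate of the general kind described above can be sketched as follows: scatter the template colors with Gaussian noise (held constant with redshift, as assumed in the Figure 13 caption) and count the fraction that lands inside the acceptance region. This is not Siana et al.'s actual simulation; the template track, scatter, and acceptance predicate are placeholders, and a realistic version would weight each redshift by the predicted QLF counts.

```python
import random
from typing import Callable, Sequence, Tuple


def mc_completeness(track: Sequence[Tuple[float, float]],
                    sigma: Tuple[float, float],
                    accept: Callable[[float, float], bool],
                    n_draws: int = 10000,
                    seed: int = 0) -> float:
    """Fraction of simulated QSOs that fall inside a color selection region.

    track  : template (B - V, V - i) colors at sampled redshifts
    sigma  : Gaussian scatter in (B - V, V - i), held constant with redshift
    accept : predicate implementing the selection region
    Every track point gets equal weight (no QLF weighting).
    """
    rng = random.Random(seed)
    n_accepted = 0
    n_total = 0
    for bv0, vi0 in track:
        for _ in range(n_draws):
            bv = rng.gauss(bv0, sigma[0])
            vi = rng.gauss(vi0, sigma[1])
            if accept(bv, vi):
                n_accepted += 1
            n_total += 1
    return n_accepted / n_total


# Hypothetical: one template color sitting exactly on a V - i = 0 boundary.
print(mc_completeness([(0.5, 0.0)], (0.1, 0.1), lambda bv, vi: vi < 0.0))
```

A template color on the boundary yields a completeness near 0.5, illustrating how scatter near the edges of the region lowers completeness.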

To test the bounds on completeness and efficiency, we run
the selection on the XRPS sample population (which likely consists
of ~90% AGN, but only ~50% type 1 AGN) and compute the efficiency
and completeness for this sample. The 1073 X-Ray point sources
introduced in §2.5 were not used directly in designing the algorithms,
but they may now be used retrospectively to probe the efficiency and
completeness. We add the selected X-Ray point sources back into the
selected objects (modifying the last steps in Tables 3 and [?]), and
then compute the efficiency and completeness with respect to these
X-Ray objects. Altogether, 323 X-Ray sources are targeted by the low-z
algorithm, 203 by the int-z algorithm, and 57 by the high-z technique.
The relative completeness and efficiency of the algorithms for the
X-Ray point sources may be seen in Figure 18. Low yields at faint
magnitudes are potentially misleading, since far fewer XRPS lie at
such faint magnitudes. The completeness must also not be
misinterpreted: it represents the fraction of the 1073 XRPS selected
by each of the low-z, int-z, and high-z algorithms. Since the XRPS
likely include very few high redshift objects (Trump et al. 2007),
the low completeness for high-z is expected (upper right panel of
Figure 18), but the algorithm is very efficient in selecting those
XRPS that are suspected to be high-z sources (lower right panel of
Figure 18). The algorithms are fairly successful in targeting such
faint optical objects, with efficiencies as high as 50% and
completeness as high as 40%. The X-Ray point sources are already
known to be probable AGN, but this test shows that the selection
methodology can successfully target AGN with reasonable yields.
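Using only the counts quoted above, the magnitude-integrated XRPS completeness of each algorithm follows directly. The sketch below reproduces that arithmetic; the per-magnitude values in Figure 18, which reach ~40%, and the efficiencies, which need the total number selected in each bin, are not recomputed here.

```python
N_XRPS = 1073  # total X-Ray point sources (from the text)

# Number of XRPS targeted by each algorithm (from the text).
SELECTED_XRPS = {"low-z": 323, "int-z": 203, "high-z": 57}


def xrps_completeness(selected=SELECTED_XRPS, n_total=N_XRPS):
    """Fraction of all XRPS recovered by each algorithm, over all magnitudes."""
    return {name: n / n_total for name, n in selected.items()}


for name, frac in xrps_completeness().items():
    print(f"{name}: {frac:.1%}")
# low-z: 30.1%
# int-z: 18.9%
# high-z: 5.3%
```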



7. CONCLUSIONS 

The method described in this paper aims to probe the faint
end of the quasar luminosity function via optical AGN selection,
which must contend with a dominant contaminating population of
faint stars and galaxies. Pushing optical selection to this faint
level (i ~ 24.5) requires extensive knowledge of stellar colors,
stellar number counts, galaxy color contamination, compact galaxies,
AGN color and morphology, and reliable predictions of counts
from the evolving quasar luminosity function. This paper establishes
optical AGN selection methods for the COSMOS field (with ground-based
photometry from Subaru and CFHT, along with Hubble ACS imaging) using
data on spectroscopically confirmed AGN, X-Ray point sources, AGN color
templates, and stellar studies of the COSMOS field. We have discussed
and accounted for the colors of both AGN and the contaminating stellar
and starburst galaxy populations, the use of the Gini coefficient as a
reliable discriminant between point sources (likely AGN or stars) and
extended galaxies, and the evolution of both color and morphology as
functions of redshift and magnitude. While blue galaxies dominate the
color contamination of AGN at all magnitudes, the Gini coefficient can
significantly reduce this contamination for unresolved AGN (although it
cannot select against compact blue galaxies).
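For reference, a common definition of the Gini coefficient for galaxy morphology (Lotz, Primack & Madau 2004) ranks the pixel fluxes and measures how unequally the light is distributed. The sketch below implements that formula; which pixels are assigned to an object (segmentation, aperture, band) is an assumption left to the caller and is not specified in this section.

```python
import numpy as np


def gini(pixel_fluxes) -> float:
    """Gini coefficient of the pixel flux distribution.

    G = sum_i (2i - n - 1) |X_i| / (mean|X| * n * (n - 1)),
    with |X_i| sorted in increasing order (i = 1..n).
    G -> 0 when flux is spread evenly; G -> 1 when it sits in one pixel.
    """
    x = np.sort(np.abs(np.asarray(pixel_fluxes, dtype=float).ravel()))
    n = x.size
    if n < 2 or x.mean() == 0.0:
        raise ValueError("need at least two pixels and nonzero total flux")
    i = np.arange(1, n + 1)
    return float(np.sum((2 * i - n - 1) * x) / (x.mean() * n * (n - 1)))


print(gini([1.0, 1.0, 1.0, 1.0]))  # 0.0
print(gini([0.0, 0.0, 0.0, 5.0]))  # 1.0
```

Unresolved sources concentrate their flux in a few pixels and so tend toward high G, while extended galaxies spread their light and tend toward lower G; a compact blue galaxy can mimic a point source in this statistic, which is why it cannot be rejected this way.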

The method of targeting AGN was split into three parts:
one for low redshift AGN (z < 2.5), one for intermediate
redshift AGN (2.5 < z < 3.0), and another for high redshift
AGN (z > 3.0). The low and high redshift selections
straddle the regime 2.5 < z < 3.0, where AGN colors
resemble those of A stars in every band and are therefore
indistinguishable from stellar contaminants. We designed
a method to target these intermediate redshift AGN, but the
selection is significantly hindered by higher contamination
rates than those of the low-z and high-z algorithms.
The low redshift algorithm was based on the bluest baseline,
u − B, and the Gini coefficient. The low-z AGN were identified
as consistently bluer than most stars and more compact
than most galaxies. The high redshift algorithm used more
than one color to identify AGN, adopting B − V and V − i.
It relied on AGN templates to predict the color properties of
AGN, but also used the Gini coefficient to eliminate extended
sources from the candidate pool. We designed the intermediate
redshift selection as a branch of the high redshift selection; it
is important to target these redshifts since AGN are known to be
more numerous in the range 2.5 < z < 3.0 than at higher redshift.
It is better to design a separate int-z algorithm that accepts
heavier contamination from stars than to miss this AGN population
completely.

The selection algorithms are designed to maximize both the
completeness and efficiency of selecting faint AGN. Although
these quantities can only be estimated in advance of confirming
spectroscopy of the selected candidates, the method has
proven successful in recovering a test sample of
X-Ray point sources (which are known to be ~90% AGN
and ~50% type 1 AGN). With ~2700 low redshift candidates,
~1000 intermediate redshift candidates, and ~600 high
redshift candidates in the 2 deg² COSMOS field, the method
could hypothetically recover ~700 low-z AGN, ~200 int-z
AGN, and ~200 high-z AGN. As a conservative estimate,
roughly 2 to 10 candidates will have to be observed to identify
each new AGN. A total of ~160 candidates had been
observed with Magellan IMACS and LDSS3 as of May 2007,
and more observations are planned.
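The yield figures above translate directly into an implied efficiency and a number of candidates to observe per AGN (the reciprocal of efficiency). The sketch below uses only the approximate counts quoted in this paragraph; the quoted range of 2 to 10 candidates per AGN corresponds to efficiencies between 50% and 10%.

```python
# Approximate candidate counts and hypothetical AGN yields quoted in the text.
ESTIMATES = {"low-z": (2700, 700), "int-z": (1000, 200), "high-z": (600, 200)}


def candidates_per_agn(n_candidates: float, n_agn: float) -> float:
    """Candidates that must be observed per AGN identified (1 / efficiency)."""
    return n_candidates / n_agn


for name, (n_cand, n_agn) in ESTIMATES.items():
    ratio = candidates_per_agn(n_cand, n_agn)
    print(f"{name}: efficiency {n_agn / n_cand:.0%}, {ratio:.1f} candidates per AGN")
```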

We would like to sincerely thank the COSMOS team; information
on the project is given in the public area of the team
website, http://cosmos.astro.caltech.edu/. We
acknowledge the staff at Caltech, CFHT, CTIO, KPNO, 